Impact of Less Skewed Distributions on Efficiency and Effectiveness of Biomedical Relation Extraction

Authors

  • Md. Faisal Mahbub Chowdhury
  • Alberto Lavelli
Abstract

As in other NLP tasks, it has been claimed that advances in machine learning (ML) based approaches to relation extraction (RE) are hampered by the imbalanced distribution of positive and negative instances in the annotated training data. Usually, negative instances far outnumber positive ones, and the same skew exists in the test data. In this paper, we address the problem of imbalanced distribution by automatically discarding less informative negative instances. We propose criteria for identifying such instances and incorporate them into an existing state-of-the-art RE approach. Empirical results on 5 benchmark biomedical corpora show that the proposed approach improves both recall and F1 scores. At the same time, there is a large drop in the number of negative instances and in execution runtime.

The paper also provides its title and abstract in Italian, under the title “L’Impatto di Distribuzioni Meno Squilibrate sull’Efficienza e l’Efficacia dell’Estrazione di Relazioni Biomediche”.
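The abstract does not spell out the criteria used, so the following is only a minimal sketch of the general idea: apply a filtering rule to candidate instances before training or prediction and observe how the negative-to-positive ratio changes. The `Instance` type, the `same_mention` rule, and the toy data are hypothetical placeholders, not the authors' method.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Instance:
    """A candidate entity pair (hypothetical structure, for illustration)."""
    entity_a: str
    entity_b: str
    is_positive: bool  # gold label, available for annotated data only


def negative_to_positive_ratio(instances: List[Instance]) -> float:
    """Return the number of negatives per positive instance."""
    positives = sum(1 for inst in instances if inst.is_positive)
    negatives = len(instances) - positives
    return negatives / positives if positives else float("inf")


def same_mention(inst: Instance) -> bool:
    """Placeholder criterion (an assumption, not taken from the paper):
    a pair whose two mentions are the same string is treated as
    less informative."""
    return inst.entity_a.lower() == inst.entity_b.lower()


def filter_candidates(
    instances: List[Instance],
    is_less_informative: Callable[[Instance], bool],
) -> List[Instance]:
    """Drop candidates flagged by the criterion. The rule does not read
    the gold label, so it can be applied to test data as well."""
    return [inst for inst in instances if not is_less_informative(inst)]


# Toy data: 2 positive and 4 negative candidates.
candidates = [
    Instance("aspirin", "warfarin", True),
    Instance("ibuprofen", "lithium", True),
    Instance("aspirin", "Aspirin", False),
    Instance("lithium", "lithium", False),
    Instance("aspirin", "lithium", False),
    Instance("warfarin", "ibuprofen", False),
]

kept = filter_candidates(candidates, same_mention)
print("ratio before:", negative_to_positive_ratio(candidates))
print("ratio after: ", negative_to_positive_ratio(kept))
```

Because fewer candidates reach the classifier, a filter of this kind would also be expected to reduce training and prediction time, which is consistent with the runtime drop the abstract reports.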


Similar resources

Effectiveness and Efficiency of Open Relation Extraction

A large number of Open Relation Extraction approaches have been proposed recently, covering a wide range of NLP machinery, from “shallow” (e.g., part-of-speech tagging) to “deep” (e.g., semantic role labeling, SRL). A natural question, then, is what the tradeoff is between NLP depth (and its associated computational cost) and effectiveness. This paper presents a fair and objective experimental comp...


Impact of Various Beam Parameters on Lateral Scattering in Proton and Carbon-ion Therapy

Background: In radiation therapy with ion beams, lateral distributions of absorbed dose in the tissue are important. Heavy ion therapy, such as carbon-ion therapy, is a novel technique of high-precision external radiotherapy which has advantages over proton therapy in terms of dose locality and biological effectiveness. Methods: In this study, we used Monte Carlo method-based Geant4 toolkit to s...


Using Weighted Distributions for Modeling Skewed, Multimodal and Truncated Data

When observations exhibit a multimodal, asymmetric, or truncated structure, or a combination of these, the usual unimodal and symmetric distributions lead to misleading results. Distributions capable of modeling skewness, multimodality, and truncation have therefore long been of central interest in the statistical literature. There are different methods to contract ...


The Family of Scale-Mixture of Skew-Normal Distributions and Its Application in Bayesian Nonlinear Regression Models

Previous studies that fit nonlinear regression models with a symmetric structure usually assume normality in the analysis of the data. This choice may be inappropriate when the distribution of the residual terms is asymmetric. Recently, the family of scale-mixture of skew-normal distributions has attracted the attention of many researchers. This family includes several skewed and heavy-tailed d...


Economic design of x̄ control charts considering process shift distributions

Process shift is an important input parameter in the economic design of control charts. Earlier x̄ control chart designs assumed that a constant shift occurs in the process mean for a given assignable cause. This assumption has been criticized by many researchers, since a constant shift whenever an assignable cause occurs may not be realistic. To overcome this difficulty...




Publication date: 2012